Small-sample Missing Environmental Test Data Prediction Model Based on k-Nearest Neighbors and Ensemble Learning

ZHOU Junyan, HE Qiongyao, ZHOU Lei, ZHU Yangyong, WANG Jingcheng, LI Yong, SHU Chang, HU Guangyang

Equipment Environmental Engineering ›› 2025, Vol. 22 ›› Issue (11): 160-168.

DOI: 10.7643/issn.1672-9242.2025.11.017
Environmental Test and Observation

Abstract

Objective To build an accurate prediction model from small-sample environmental test data with missing values. Methods Outliers were screened using threshold criteria and Grubbs' test. Missing values were then imputed with a nearest-neighbor algorithm adapted to the characteristics of environmental test data, and a paired two-sample t-test for means was used to determine whether the data had changed significantly. Finally, bagging ensemble learning was used to predict from the small-sample data with missing values. Results For laboratory accelerated environmental test data on the compression set rate of a certain type of rubber, the prediction errors of the weak learners ranged from 14.8% to 41.6%, whereas the prediction error of the ensemble learning model was 6.4%, significantly better than the benchmark method. Conclusion Nearest-neighbor imputation addresses the problem of missing values in small-sample environmental test data, and prediction based on ensemble learning yields a model with strong generalization ability and high robustness. The approach offers a new paradigm for modeling low-quality data from complex environmental tests.
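The imputation and bagging steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not specify the weak learners, the distance measure, the value of k, or how the nearest-neighbor method was adapted to environmental test data, so the sketch assumes plain k-nearest-neighbor averaging over the columns two rows share and a least-squares straight line as the weak learner. The function names `knn_impute`, `fit_line`, and `bagging_predict` and the toy data are hypothetical. Outlier screening with Grubbs' test and the paired t-test are left out.

```python
import math
import random
from statistics import fmean


def knn_impute(rows, k=2):
    """Fill each None with the mean of that column over the k nearest rows.

    Assumption: distance is the root-mean-square difference over the
    columns that both rows have observed.
    """
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(
                    sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda c: c[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:
                filled[i][j] = fmean(nearest)
    return filled


def fit_line(xs, ys):
    """Least-squares fit of y = a + b * x (the assumed weak learner)."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        # Degenerate bootstrap sample: every x is identical.
        return my, 0.0
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def bagging_predict(xs, ys, x_new, n_estimators=50, seed=0):
    """Average the predictions of weak learners fit on bootstrap resamples."""
    rng = random.Random(seed)
    n = len(xs)
    predictions = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        a, b = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
        predictions.append(a + b * x_new)
    return fmean(predictions)


# Made-up data for illustration only: [test time, measured value].
toy_rows = [[1.0, 10.0], [2.0, 12.0], [3.0, None], [4.0, 16.0], [5.0, 18.0]]
toy_filled = knn_impute(toy_rows, k=2)
toy_prediction = bagging_predict(
    [r[0] for r in toy_filled], [r[1] for r in toy_filled], x_new=6.0)
```

On the toy data the gap at time 3.0 is filled with 14.0, the mean of its two nearest neighbors in time (12.0 and 16.0). Averaging over bootstrap resamples is what lets the ensemble damp the variance of individual weak learners, which is the effect the abstract reports when comparing weak-learner and ensemble errors.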

Key words

small sample / environmental test / missing data / prediction model / k-nearest neighbors / ensemble learning

Cite this article

ZHOU Junyan, HE Qiongyao, ZHOU Lei, ZHU Yangyong, WANG Jingcheng, LI Yong, SHU Chang, HU Guangyang. Small-sample Missing Environmental Test Data Prediction Model Based on k-Nearest Neighbors and Ensemble Learning[J]. Equipment Environmental Engineering, 2025, 22(11): 160-168. https://doi.org/10.7643/issn.1672-9242.2025.11.017
CLC number: TP391
